1 Introduction

The United States of America is a relatively young country, having existed for a little over 200 years. During this time, it has passed through a number of very different phases on its way to becoming a major global power: early expansionism, civil war and industrialization, the world wars, the Cold War, and modern America.

The question we ask in this study is: how is the USA’s evolution through history reflected in its leaders’ speeches? To answer it, we analyze the State of the Union speeches, which have been given yearly since 1790. Our objective is to gain insights into the general sentiment, as well as the relevance of and sentiment towards specific entities related to each period. Our main assumption is that a leader’s speech reflects the overall sentiment of the nation.

To do this, we divided our analysis into six periods that we consider characteristic of the history of the United States. The periods are the following:

  1. (1789 - 1861) A New Nation: This period was characterized by territorial expansion and massive immigration from Europe and Asia. Most of the states of the present-day USA were incorporated in this period. In general, it was a prosperous era, ended by the worst conflict the country has yet seen: the Civil War.

  2. (1861 - 1913) Civil War and Industrialization: This period starts with the Civil War (1861 - 1865) and continues through the late 19th and early 20th centuries. Great technological changes occurred in this period. Electric power came into widespread use, and transportation was revolutionized.

  3. (1914 - 1945) World Wars: Enormous changes occurred worldwide during this period. Two major wars and the Great Depression shaped the modern world as we know it. The USA, together with the Soviet Union, reaffirmed its position as a major world power.

  4. (1945 - 1989) Cold War: This was a period of great tension between countries. Several conflicts arose, such as the Korean War and the Vietnam War. Economically speaking, there was tremendous growth and innovation. The Internet and space exploration both began in this period. The era ends with the fall of the Berlin Wall and the subsequent fall of the Soviet Union.

  5. (1990 - 2001) Nineties: This was, in general, a period of economic prosperity. While the “communist threat” had vanished, a new “enemy” came onto the scene: terrorists.

  6. (2002 - 2016) Modern America: In the most recent years, there has been tremendous technological innovation. However, a major economic crisis and global terrorism have been constant topics of public discussion.
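The period boundaries above can be turned into a small lookup that assigns a speech year to a period. The following is a minimal sketch, not the code used in the study: the names `PERIOD_STARTS`, `PERIOD_NAMES`, and `period_for_year` are hypothetical, and the rule that a shared boundary year (1861, 1945) belongs to the later period is an assumption, since the text does not say how those years were assigned.

```python
from bisect import bisect_right

# First year of each period, taken from the list above. Where two ranges
# share a boundary year (1861, 1945), this sketch assigns that year to the
# later period; that tie-break is an assumption, not stated in the text.
PERIOD_STARTS = [1789, 1861, 1914, 1945, 1990, 2002]
PERIOD_NAMES = [
    "A New Nation",
    "Civil War and Industrialization",
    "World Wars",
    "Cold War",
    "Nineties",
    "Modern America",
]
LAST_YEAR = 2016


def period_for_year(year: int) -> str:
    """Return the name of the period that a speech year falls into."""
    if not PERIOD_STARTS[0] <= year <= LAST_YEAR:
        raise ValueError(f"year {year} is outside 1789-2016")
    # bisect_right counts how many period start years are <= year.
    index = bisect_right(PERIOD_STARTS, year) - 1
    return PERIOD_NAMES[index]
```

For example, `period_for_year(1933)` falls in the World Wars period, and a year outside 1789 to 2016 raises `ValueError`.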

We divide our analysis into four parts:

  1. General sentiment analysis: We explore the sentiment of presidential speeches over time, and relate some of the observations to historical events.

  2. Targeted analysis: We break down the sentiment analysis by entity to observe which entities are associated with more positive, more negative, or more fluctuating sentiment over time.

  3. Word analysis by party: We analyze how presidential speeches reflect fundamental differences between parties, and explore how these differences have evolved over time.

  4. Entity network analysis: We give a general overview of the relationships between entities that are relevant to the USA, grouping them into several subgroups according to their network structure.

2 General Sentiment Analysis

We analyze the sentiment (that is, the attitude, opinion, or feeling) of the speeches across these six periods using the Alchemy API. For example, if a speech contains a pessimistic view of the economy or expresses great concern about a war, it is classified as having negative sentiment. In contrast, if the speech shows an optimistic attitude towards the future economic situation or reflects strong optimism about winning a war, it is classified as having positive sentiment.

In the Speech Sentiment graph, speeches are colored according to the incumbent president’s political party. Following tradition, we plotted Democrats in blue, Republicans in red, and presidents of neither party in black. The y axis represents the intensity of the sentiment, where zero represents neutral sentiment. It is important to note that we categorized presidents belonging to the historical “Whig” and “Democratic-Republican” parties as Republicans, while the “Federalists” were categorized as Democrats. This categorization was made according to the history of those parties.

Overall, more than 90% of the speeches are classified as having positive sentiment. This is not difficult to explain: presidents can be expected to avoid delivering negative speeches in public too often. Even during the hardest periods, such as the Great Depression and World War II, the president has to encourage citizens and raise morale.

Although very few speeches are classified as having negative sentiment, each of them can be related to a major event in history. For example, the several negative speeches at the end of the A New Nation period coincide with the approach of the Civil War. Before the outbreak of the war, the North and South had irreconcilable conflicts over crucial issues such as slavery, and the prospect of war hung over the entire country. The negative speeches in the World Wars period are related to the Great Depression and the outbreak of World War II. Lastly, the negative speeches from 2002 to the present are connected to 9/11, the Iraq War, and above all the 2008-2009 economic crisis.

3 Targeted Analysis

Next, we focus on analyzing specific entities mentioned in the speeches. We use four criteria to select which words to visualize:

  1. Most frequent words.
  2. Words with the largest sentiment fluctuation.
  3. Words with most positive sentiment.
  4. Words with most negative sentiment.

We find some interesting results. For example, government has been regarded as one of the most negative entities. Another observation is that China has become the most positive entity in recent years. Also, the sentiment towards the British government has changed over time, following fluctuations in the relationship between the two countries. Similarly, many other terms can be compared to see how their sentiment has changed over time.

Turning to another perspective, we plotted a histogram of the relevance of the 20 most frequently used entities. The relevance score measures how frequently an entity is mentioned by a specific president.

From the histogram we can observe that Congress is mentioned by all presidents, as is common practice in the address. The Navy has barely been mentioned since the Cold War. This may be because naval development was no longer the top priority while the U.S. government was implementing the “Star Wars” program.

4 Word Analysis by Party

In this section we compare Republicans and Democrats over time. For each period, we compared the most common words used by Republican and Democratic presidents.

We calculated a “similarity score” between Republican and Democratic speeches for each period. The score is the correlation between the two parties’ vectors of relative word frequencies, and it therefore ranges from -1 to 1.
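The similarity score described above can be sketched as follows. This is an illustrative sketch, not the code used in the study: it assumes that speeches have already been tokenized into lists of lowercase words, that the correlation is the Pearson correlation, and that the shared vocabulary is the union of both parties’ words. The function names are hypothetical.

```python
from collections import Counter
from math import sqrt


def relative_frequencies(tokens, vocabulary):
    """Relative frequency of each vocabulary word within one party's tokens."""
    counts = Counter(tokens)
    total = len(tokens)
    return [counts[word] / total for word in vocabulary]


def pearson(xs, ys):
    """Pearson correlation of two equal-length vectors.

    Raises ZeroDivisionError if either vector is constant.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def similarity_score(republican_tokens, democrat_tokens):
    """Correlation between the two parties' relative word frequency vectors."""
    # Assumption: both vectors are indexed by the union of the two vocabularies.
    vocabulary = sorted(set(republican_tokens) | set(democrat_tokens))
    rep = relative_frequencies(republican_tokens, vocabulary)
    dem = relative_frequencies(democrat_tokens, vocabulary)
    return pearson(rep, dem)
```

Two parties that use the same words in the same proportions score 1, while parties whose frequent words do not overlap at all receive a negative score.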

We first plotted the similarity score to show how Republicans and Democrats have been diverging in their speeches. While early presidents typically used general, common terms such as people, citizens, and constitution, modern presidents have diverged. Modern Republicans tend to speak more about terrorism, fear, and war, while modern Democrats tend to focus on terms like welfare, jobs, and health care.

It is worth noting the low similarity score observed during the World Wars period. The reason for this difference is that Democratic presidents had to deal with both World Wars, while Republican presidents governed during the relatively prosperous 1920s. Therefore, their speeches were radically different.

With the hope of further understanding the differences between parties in different time periods, we decided to show comparisons from the first and the last periods: A New Nation and Modern America.

Below, we show three analyses for each of the two periods.

  1. We generated a word cloud for each party in the period (Republican in red, Democrat in blue).
  2. We printed the 20 words that best “characterize” each party in that period. To calculate this, we subtracted the Democrats’ relative frequency vector from the Republicans’ relative frequency vector. Thus, positive scores indicate “republicanism”, and negative scores indicate “democratism”.
  3. We trained a Markov chain model on our text to generate text for each period/party combination.
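Steps 2 and 3 above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the code used in the study: the input is assumed to be a list of tokens per party, and the Markov chain is assumed to be a first-order, word-level chain, since the text does not specify its order. All function names are hypothetical.

```python
import random
from collections import Counter, defaultdict


def characteristic_words(republican_tokens, democrat_tokens, top_n=20):
    """Score each word as Republican minus Democrat relative frequency."""
    rep_counts = Counter(republican_tokens)
    dem_counts = Counter(democrat_tokens)
    rep_total = len(republican_tokens)
    dem_total = len(democrat_tokens)
    scores = {
        word: rep_counts[word] / rep_total - dem_counts[word] / dem_total
        for word in set(rep_counts) | set(dem_counts)
    }
    ranked = sorted(scores.items(), key=lambda item: item[1])
    most_democrat = ranked[:top_n]          # most negative scores first
    most_republican = ranked[::-1][:top_n]  # most positive scores first
    return most_republican, most_democrat


def build_chain(tokens):
    """First-order chain (an assumption): word -> list of observed next words."""
    chain = defaultdict(list)
    for current_word, next_word in zip(tokens, tokens[1:]):
        chain[current_word].append(next_word)
    return chain


def generate(chain, start, length, rng):
    """Random walk over the chain, stopping early at a word with no successor."""
    words = [start]
    while len(words) < length:
        followers = chain.get(words[-1])
        if not followers:
            break
        words.append(rng.choice(followers))
    return " ".join(words)
```

Because repeated successors are stored as repeated list entries, `rng.choice` samples the next word in proportion to how often it followed the current word in the training text. Passing a seeded `random.Random` instance makes a generated passage reproducible.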

(1789 - 1860) New Nation
Similarity score: 0.93

Top words (Republican)   Score        Top words (Democrat)   Score
great                    0.0016912    upon                   -0.0023555
millions                 0.0013367    people                 -0.0022939
nation                   0.0013179    mexico                 -0.0020358
last                     0.0012010    constitution           -0.0019729
british                  0.0010091    public                 -0.0014277
spain                    0.0008060    president              -0.0012999
commerce                 0.0008033    bank                   -0.0012531
nations                  0.0008010    general                -0.0011208
view                     0.0007961    power                  -0.0010619
improvement              0.0007842    money                  -0.0010279
consideration            0.0007269    banks                  -0.0010138
course                   0.0006950    state                  -0.0010076
peace                    0.0006660    federal                -0.0009782
progress                 0.0006492    duty                   -0.0008222
thought                  0.0006437    mexican                -0.0008132
militia                  0.0006262    thus                   -0.0008050
session                  0.0006239    question               -0.0007280
establishment            0.0006038    character              -0.0007250
tribes                   0.0005943    present                -0.0007201
several                  0.0005903    republic               -0.0007178

4.0.1 Generated speech

New Nation, Democrat:

The remedial policy, the principles and policy of augmenting the military defenses recommended by every branch of the precedent of the President shall exercise his own Government, and that for that mutual good will of those sales during the last session of Congress, the next fiscal year of $404,878.53, or more propriety than the public works, plant schools throughout their Territorial existence, and would foster a system of discriminating and countervailing duties necessarily produces. The selection and of personal communication with California.

New Nation, Republican:

While dwelling with pleasing satisfaction upon the general impulse required for the want of an act of December last that instructions had been anticipated as Spain must have known that the expedition having been fully accomplished. The basis of action in public offices is established by those who promoted and facilitated by the laws on the 30th of April 29th, 1816, was the destiny of nations. The question, therefore, whether it should be enabled to judge of the other by partial agreement.

(2002 - 2016) Modern America
Similarity score: 0.78

4.0.2 Generated speech

Modern, Democrat:

We ought to be a tough economy. I vetoed that proposal to Congress comprehensive legislation that will cover the uninsured, strengthen Medicare for older Americans. Every plan before the Congress to support what works and greater energy independence. We need to ultimately make clean, renewable energy in history, with the people who are behind to catch criminals and drug abuse and heroin abuse. So, who knows, we might perfect our Union. And despite all our children futures to say to those beyond our shores. Right now it helps about half of all children who lose their health care. Forty million Americans without health insurance industry from exploiting patients.

Modern, Republican:

Our country must also act now because it means the most important institutions – a symbol of quality and progress, And where every one who has a new century, your century, on dreams we cannot see, on the offensive by encouraging economic growth, and reforms in education and support the training and launch a major al-Qaida leader in Yemen. All told, more than 3,000 suspected terrorists have chosen the weapon of fear. Some speak of an American tradition to show a certain skepticism toward our democratic institutions.

5 Entity Network Analysis

Lastly, to get an overall summary of the United States presidents’ speeches over the course of history, we performed a word network analysis on the whole data set to group the different entities and explore the relationships among them.

For this analysis we construct a word network \(G=(V,E)\) with weights \(w_{ij}\) in the following way:

  1. Each entity is represented by a node \(i \in V\).
  2. If two entities \(i\) and \(j\) are mentioned in the same speech, we add an edge \((i,j) \in E\) between them.
  3. The weight \(w_{ij}\) of an edge is the number of speeches in which both \(i\) and \(j\) are mentioned.
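The three construction rules above can be sketched as follows. This is a minimal sketch, assuming each speech has already been reduced to the list of entities it mentions; the function name `build_cooccurrence_network` and the edge representation (a sorted pair of entity names) are hypothetical choices for illustration.

```python
from collections import Counter
from itertools import combinations


def build_cooccurrence_network(speech_entities):
    """Build nodes V and weighted edges w_ij from per-speech entity lists."""
    nodes = set()
    weights = Counter()
    for entities in speech_entities:
        # Deduplicate so each speech contributes at most 1 to any edge weight.
        unique = sorted(set(entities))
        nodes.update(unique)
        # Every unordered pair mentioned in the same speech gets an edge;
        # sorting makes (i, j) and (j, i) map to the same key.
        for i, j in combinations(unique, 2):
            weights[(i, j)] += 1
    return nodes, weights
```

With this representation, `len(nodes)` and `len(weights)` give the node and edge counts, and `weights.values()` gives the weight distribution.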

After constructing the network, we have the following structure:

  1. 2,847 nodes
  2. 139,309 edges
  3. Weights:

     Min.   1st Qu.   Median   Mean    3rd Qu.   Max.
     1      1         1        1.596   1         215

To find which terms are related to one another, we used the Louvain community detection algorithm. Entities belonging to the same group are more strongly related to each other than to entities outside their group. Using the default modularity parameter of 1.0, we found 7 groups.
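The Louvain algorithm searches greedily for a partition of the nodes that maximizes modularity. The sketch below does not implement Louvain itself; it only evaluates the weighted modularity of a given partition, which is the quantity the algorithm optimizes. It assumes edges are stored as a mapping from node pairs to weights with no self-loops, and it assumes that the “modularity parameter of 1.0” mentioned above corresponds to the `resolution` argument. The function and argument names are hypothetical.

```python
from collections import defaultdict


def modularity(weights, community_of, resolution=1.0):
    """Weighted modularity of a partition of an undirected graph.

    weights: {(i, j): w_ij} with each undirected edge listed once.
    community_of: {node: community label}.
    """
    total_weight = sum(weights.values())  # m, the sum of all edge weights
    internal = defaultdict(float)  # edge weight inside each community
    strength = defaultdict(float)  # summed weighted degree of each community
    for (i, j), w in weights.items():
        strength[community_of[i]] += w
        strength[community_of[j]] += w
        if community_of[i] == community_of[j]:
            internal[community_of[i]] += w
    # Q = sum over communities of L_c / m - resolution * (d_c / 2m)^2
    return sum(
        internal[c] / total_weight
        - resolution * (strength[c] / (2 * total_weight)) ** 2
        for c in strength
    )
```

A partition that puts densely connected entities together scores higher than one that lumps every node into a single group, which scores 0 at resolution 1.0.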

The following graph shows the word network obtained:

[entities_network]

Node size represents degree, label size represents eigenvector centrality, and color represents the modularity class.

It is useful to zoom in on the graph to analyze particular terms or groupings. For example, we can see interesting cases like the following two:

In the first interesting case, if we focus on the green nodes, we can see that most of them correspond to World War II military terms such as Army Service Forces, German Army, Japanese Fleet, Air Force, etc.

The following is another interesting example. The blue nodes in the middle and the purple nodes nearby also contain war-related terms. The blue ones are more related to World War I, such as Austria Hungary, Balkan peninsula, and League of Nations, while the purple ones are other war terms related to places or people, such as Pearl Harbor, Hitler, Mr. Churchill, etc.

The advantage of this kind of analysis is that it not only highlights the most important entities, but also finds relations between them and identifies which of them are similar.

The full-resolution version of the network is not included here because the file is large, but it is available at the following link: Full word network.

6 Conclusion

The brief history of the United States of America has been full of interesting events that have shaped not only the people of the country, but also those of the rest of the world.

Presidential speeches contain information that reflects the general sentiment of the country, and even the world. On the one hand, partisan ideologies are deeply embedded in presidential speeches, which allows us to analyze the broad prevailing ideologies by looking at them. On the other hand, milestones in human history such as the World Wars and economic recessions have had deep effects on people’s morale, and presidential speeches reflect that reality.